Access row/col data via attributes #3045

ghost · 2013-03-14T02:08:35Z

Supersedes ENH: Series item access via attribute #1903 #1904
Namespaces rows/cols as df.r.<row label>, df.c.<col label>
get/set, MultiIndex on both axes
Designates df. (7531038) for future removal.
No perf hit on attribute lookup, if you don't use it (raised in ENH: Series item access via attribute #1903 #1904).
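
For readers who want a feel for the namespacing idea, here is a minimal sketch, not the code from this branch. `_AxisNamespace` is a hypothetical helper written against the current public pandas API; it handles flat string labels only and copies values positionally on assignment.

```python
import pandas as pd


class _AxisNamespace:
    """Hypothetical helper: expose the labels of one axis as attributes."""

    def __init__(self, frame, axis):
        # Store state via object.__setattr__ to bypass our own __setattr__.
        object.__setattr__(self, "_frame", frame)
        object.__setattr__(self, "_axis", axis)

    def _labels(self):
        return self._frame.index if self._axis == 0 else self._frame.columns

    def __dir__(self):
        # Offer only identifier-like string labels for tab completion.
        return [l for l in self._labels() if isinstance(l, str) and l.isidentifier()]

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup has failed.
        if name not in self._labels():
            raise AttributeError(name)
        return self._frame.loc[name] if self._axis == 0 else self._frame[name]

    def __setattr__(self, name, value):
        if name not in self._labels():
            raise AttributeError(name)
        # Copy by position rather than by label alignment.
        values = value.to_numpy() if hasattr(value, "to_numpy") else value
        if self._axis == 0:
            self._frame.loc[name] = values
        else:
            self._frame[name] = values


df = pd.DataFrame({"C0": [1, 2], "C1": [3, 4]}, index=["R0", "R1"])
rows, cols = _AxisNamespace(df, 0), _AxisNamespace(df, 1)
rows.R1 = rows.R0   # row R1 now holds the values of row R0
cols.C1 = cols.C0   # column C1 now holds the values of column C0
```

Because the labels live on a separate object, they cannot collide with the frame's own methods and properties, which is the neatness argument made here.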

Open issues:

.r and .c (which I like) don't generalize to dim>2. Use a numbered scheme instead (a0, a1, etc.)?
No Panel support. No reason, I just never use them.
Setting columns has a caveat; I need your numpy-fu help here (see bottom).

Demo:

In [1]: import pandas as pd
   ...: from pandas.util.testing import makeCustomDataframe as mkdf
   ...: 
   ...: pd.options.display.notebook_repr_html=False

In [2]: df=mkdf(4,2,r_idx_nlevels=1)

In [3]: df
Out[3]: 
C0      C_l0_g0 C_l0_g1
R0                     
R_l0_g0    R0C0    R0C1
R_l0_g1    R1C0    R1C1
R_l0_g2    R2C0    R2C1
R_l0_g3    R3C0    R3C1

In [4]: df.r.R_l0_g1 # tab-complete
Out[4]: 
C0
C_l0_g0    R1C0
C_l0_g1    R1C1
Name: R_l0_g1, dtype: object

In [5]: df.r.R_l0_g1 = df.r.R_l0_g0

In [6]: df
Out[6]: 
C0      C_l0_g0 C_l0_g1
R0                     
R_l0_g0    R0C0    R0C1
R_l0_g1    R0C0    R0C1
R_l0_g2    R2C0    R2C1
R_l0_g3    R3C0    R3C1

In [7]: df.c.C_l0_g1 # tab-complete
Out[7]: 
R0
R_l0_g0    R0C1
R_l0_g1    R0C1
R_l0_g2    R2C1
R_l0_g3    R3C1
Name: C_l0_g1, dtype: object

In [8]: df.c.C_l0_g1 = df.c.C_l0_g0

In [9]: df
Out[9]: 
C0      C_l0_g0 C_l0_g1
R0                     
R_l0_g0    R0C0    R0C0
R_l0_g1    R0C0    R0C0
R_l0_g2    R2C0    R2C0
R_l0_g3    R3C0    R3C0

MultiIndex example (note recursive syntax):

In [1]: import pandas as pd
   ...: from pandas.util.testing import makeCustomDataframe as mkdf
   ...: 
   ...: pd.options.display.notebook_repr_html=False

In [2]: df=mkdf(4,2,r_idx_nlevels=2,c_idx_nlevels=2)

In [3]: df
Out[3]: 
C0              C_l0_g0 C_l0_g1
C1              C_l1_g0 C_l1_g1
R0      R1                     
R_l0_g0 R_l1_g0    R0C0    R0C1
R_l0_g1 R_l1_g1    R1C0    R1C1
R_l0_g2 R_l1_g2    R2C0    R2C1
R_l0_g3 R_l1_g3    R3C0    R3C1

In [4]: df.r.R_l0_g1.r.R_l1_g1 # tab-complete. twice
Out[4]: 
C0       C1     
C_l0_g0  C_l1_g0    R1C0
C_l0_g1  C_l1_g1    R1C1
Name: R_l1_g1, dtype: object

In [5]: df.c.C_l0_g0.c.C_l1_g0 # tab-complete. twice
Out[5]: 
R0       R1     
R_l0_g0  R_l1_g0    R0C0
R_l0_g1  R_l1_g1    R1C0
R_l0_g2  R_l1_g2    R2C0
R_l0_g3  R_l1_g3    R3C0
Name: C_l1_g0, dtype: object

In [6]: df=mkdf(4,2,r_idx_nlevels=2,c_idx_nlevels=2)

In [7]: df.c.C_l0_g0.c.C_l1_g0=df.c.C_l0_g1.c.C_l1_g1

In [8]: df
Out[8]: 
C0              C_l0_g0 C_l0_g1
C1              C_l1_g0 C_l1_g1
R0      R1                     
R_l0_g0 R_l1_g0    R0C1    R0C1
R_l0_g1 R_l1_g1    R1C1    R1C1
R_l0_g2 R_l1_g2    R2C1    R2C1
R_l0_g3 R_l1_g3    R3C1    R3C1

Suggest a fix for this? I'm sure I'm just lacking in numpy-fu here.

In [7]: df.r.R_l0_g0.r.R_l1_g0 = df.r.R_l0_g1.r.R_l1_g1 # this works
In [8]: df.r.R_l0_g0.r.R_l1_g0 = df.r.R_l0_g1 # this works
In [3]: df.c.C_l0_g0.c.C_l1_g0=df.c.C_l0_g1.c.C_l1_g1 # this works
In [5]: df.c.C_l0_g0.c.C_l1_g0=df.c.C_l0_g1 # this doesn't
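
One possible reading of the failing case, offered as a guess rather than a diagnosis of this branch: with two column levels, selecting on the first level alone yields a 2-D object, while the assignment target is a single column. The sketch below rebuilds a frame shaped like the demo with the plain public API (no `.r`/`.c`), shows the two shapes, and reduces the partial selection to 1-D before assigning.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("C_l0_g0", "C_l1_g0"), ("C_l0_g1", "C_l1_g1")], names=["C0", "C1"]
)
rows = pd.MultiIndex.from_tuples(
    [(f"R_l0_g{i}", f"R_l1_g{i}") for i in range(4)], names=["R0", "R1"]
)
df = pd.DataFrame(
    [[f"R{i}C{j}" for j in range(2)] for i in range(4)], index=rows, columns=cols
)

full = df[("C_l0_g1", "C_l1_g1")]  # both levels given: a Series, shape (4,)
partial = df["C_l0_g1"]            # first level only: a DataFrame, shape (4, 1)

# Reduce the one-column frame to 1-D, then assign by position.
df[("C_l0_g0", "C_l1_g0")] = partial.squeeze(axis=1).to_numpy()
```

Whether squeezing implicitly is the right behaviour for the attribute setter is a design question; it only makes sense when the partial selection has exactly one column.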

ghost · 2013-03-14T02:11:44Z

designated 0.12, need to update release.txt before merging.

ghost · 2013-03-14T02:28:01Z

Add issue to remove df. in future release if this is merged
and removal has support.

wesm · 2013-03-15T03:09:44Z

I'm -1 on removing df.foo_col, mainly because I find it really useful. The perf hit is acceptable.

ghost · 2013-03-15T03:24:11Z

It wasn't the perf hit (is there any? getattr is a fallback) I was thinking of
but neatness.
I also find the mixing of object properties and data accessors in the same ns
messy, and so prefer separating things out in a way that also treats cols
and rows on an equal footing.
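
The "getattr is a fallback" point can be checked with a toy class. This is a pure-Python sketch, not pandas code; `Frame` and `fallback_calls` are made-up names for illustration. Python consults `__getattr__` only after normal lookup (instance dict, class attributes, properties) has failed, so real attributes never pay for it.

```python
class Frame:
    """Toy stand-in for a frame with attribute access to columns."""

    def __init__(self, columns):
        self.fallback_calls = 0
        self._columns = dict(columns)

    @property
    def shape(self):
        return (0, len(self._columns))

    def __getattr__(self, name):
        # Reached only when normal attribute lookup did not find `name`.
        self.fallback_calls += 1
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name) from None


frame = Frame({"price": [1, 2]})
frame.shape                        # property: resolved by normal lookup
calls_after_shape = frame.fallback_calls
frame.price                        # not a real attribute: falls back to __getattr__
calls_after_price = frame.fallback_calls
```

Here `calls_after_shape` stays at 0 and `calls_after_price` becomes 1, which matches the claim that only data-label access goes through the fallback.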

I'd prefer closing this rather than introducing yet another similar but different
way to do the same thing which will coexist forever.

This wasn't meant as just a feature; it's a cleanup.

wesm · 2013-03-15T03:26:03Z

I'm for adding a general way to do what we're describing. The df. is very convenient and harmless enough to stay imho (plus it's familiar compared with record arrays). I'll take a closer look later

ghost · 2013-03-15T09:36:48Z

After more thought, df.colX is heavily used in bool indexing and would be bad to give up.
Frames with labeled rows that are also identifiers (no spaces, punctuation) occur, but are
not that common, unlike column names.
This doesn't generalize well to other types of index, specifically DatetimeIndex with Timestamp
labels that begin with a digit (year).
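
To make the identifier restriction concrete, here is a small sketch of the kind of check attribute access implies. `attribute_accessible` is a hypothetical name, and `datetime` stands in for a pandas Timestamp so the example needs only the standard library.

```python
import keyword
from datetime import datetime


def attribute_accessible(label):
    """Return True if `label` could be written after a dot, as in obj.<label>."""
    return (
        isinstance(label, str)
        and label.isidentifier()
        and not keyword.iskeyword(label)
    )


stamp = datetime(2013, 3, 14)

checks = {
    "C_l0_g1": attribute_accessible("C_l0_g1"),        # plain identifier
    "unit price": attribute_accessible("unit price"),  # contains a space
    "class": attribute_accessible("class"),            # reserved word
    "stamp": attribute_accessible(stamp),               # not a string at all
    "stamp_text": attribute_accessible(str(stamp)),     # starts with a digit
}
```

Only the first entry passes; date-like labels fail both as objects and as strings, so they would still need bracket-style indexing.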

Withdrawn.

ghost mentioned this pull request Mar 14, 2013

ENH: Series item access via attribute #1903 #1904

Closed

ghost mentioned this pull request Mar 14, 2013

Think about Series item access via attribute (a la DataFrame columns) #1903

Closed

y-p added 3 commits March 15, 2013 06:08

ENH: add attribute access to row/col in Series/df using x.r.<foo>, x.c.<bar>

11622a0

TST: test data access via attributes using x.r, x.c

1cef94d

DOC: add attribute access to what's new in v0.11.0.txt

16bbb56

ghost closed this Mar 15, 2013

ghost deleted the feature/data_access_via_attrib branch December 20, 2013 15:58

This pull request was closed.
